-
Notifications
You must be signed in to change notification settings - Fork 139
Implement VAES AVX and AVX512 backends for aes #482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ae7ec9c
to
93e30c4
Compare
@tarcieri |
I think making VAES opt-in via |
In that case, I think we should release aes v0.9.0 without VAES support and shortly after release aes v0.9.1 with intrinsics-based VAES support. This would allow users who don't use the latest stable to use v0.9.0 thanks to the MSRV-aware resolver. |
I'm not sure how long it will take to stabilize all the relevant intrinsics. If they're through FCP, hopefully soon? If they take awhile we can polyfill them with |
The intrinsics (e.g. |
aes/Cargo.toml
Outdated
@@ -31,7 +31,7 @@ hazmat = [] # Expose cryptographically hazardous APIs | |||
|
|||
[lints.rust.unexpected_cfgs] | |||
level = "warn" | |||
check-cfg = ["cfg(aes_compact)", "cfg(aes_force_soft)"] | |||
check-cfg = ["cfg(aes_compact)", "cfg(aes_force_soft)", "cfg(avx256_disable)", "cfg(avx512_disable)"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps namespace these configs under aes_*
like the others?
check-cfg = ["cfg(aes_compact)", "cfg(aes_force_soft)", "cfg(avx256_disable)", "cfg(avx512_disable)"] | |
check-cfg = ["cfg(aes_compact)", "cfg(aes_force_soft)", "cfg(aes_avx256_disable)", "cfg(aes_avx512_disable)"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, this would be handled with "negative" target features, but alas. :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also may be worth to name it just aes_disable_vaes
. It does not look like we have different target features for VAES256 and VAES512 intrinsiscs, so it may be worth to just remove the vaes256
backend.
UPD: Ah, VAES512 intrinsics are also gated on avx512f
. Though I wonder if there are CPUs which support VAES256, but not VAES512.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into it and it looks like some CPU families support VAES256, but not VAES512 (e.g. Zen 3).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also may be worth to name it just
aes_disable_vaes
. It does not look like we have different target features for VAES256 and VAES512 intrinsiscs, so it may be worth to just remove thevaes256
backend.
Intel started disabling AVX512 on Alder Lake CPUs if I remember correctly and had at one point stated in their roadmap that they intended not to include it on newer consumer oriented CPUs. That was another reason I included VAES256.
It sounds like maybe Intel's plans have changed though: https://www.phoronix.com/news/Intel-AVX10-Drops-256-Bit
But I also use the VAES256 backend as a fallback for encoding tail blocks:
Lines 534 to 569 in fb85281
#[inline] | |
fn encrypt_tail_blocks(&self, blocks: InOutBuf<'_, '_, Block>) { | |
let backend = self; | |
let mut rem = blocks.len(); | |
let (mut iptr, mut optr) = blocks.into_raw(); | |
let backend = &$name_backend::Vaes256::from(backend); | |
if backend.features.has_vaes256() { | |
while rem >= backend.par_blocks() { | |
let blocks = unsafe { InOut::from_raw(iptr.cast(), optr.cast()) }; | |
backend.encrypt_par_blocks(blocks); | |
rem -= backend.par_blocks(); | |
iptr = unsafe { iptr.add(backend.par_blocks()) }; | |
optr = unsafe { optr.add(backend.par_blocks()) }; | |
} | |
} | |
let backend = &$name_backend::Ni::from(backend); | |
while rem >= backend.par_blocks() { | |
let blocks = unsafe { InOut::from_raw(iptr.cast(), optr.cast()) }; | |
backend.encrypt_par_blocks(blocks); | |
rem -= backend.par_blocks(); | |
iptr = unsafe { iptr.add(backend.par_blocks()) }; | |
optr = unsafe { optr.add(backend.par_blocks()) }; | |
} | |
while rem > 0 { | |
let block = unsafe { InOut::from_raw(iptr, optr) }; | |
backend.encrypt_block(block); | |
rem -= 1; | |
iptr = unsafe { iptr.add(1) }; | |
optr = unsafe { optr.add(1) }; | |
} | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed the config flags with the aes_
prefix as suggested.
Just so we understand each other, I have no intention of refactoring this code further, especially not a full rewrite using intriniscs, without first seeing:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just merge this and address the remaining problems in followup PRs
I think that would be a good approach. I'm willing to help maintain and continue to improve the code if it's going to be used. |
@newpavlov does that sound good to you? Edit: since it's |
Personally, I would prefer to have a proper intrinsics-based implementation released in v0.9.1, but if there is a desire to have an AVX-512 support in v0.9.0, I am fine with merging this. Although, I believe that in the latter case it should be an experimental Nighlty-only backend gated behind a |
First, thanks for finally merging the PR.
That design has been in place since the original PR, which is a long time ago now. You could have reviewed the design, performed your own benchmarks, and proposed a better alternative. I would have welcomed feedback on a better design. Unless the cipher API is redesigned to separate the backends, I don't see how you avoid a compromise. If you always store the broadcast keys, you get a very measurable slowdown for the non-VAES case which I showed in the previous thread. If you rebroadcast for every chunk of blocks for VAES, you also get a slowdown from unnecessary work. Caching the keys on demand seems to alleviate those performance hits in the benchmarks.
You could have asked me to enable the option for maintainer commits if you had changes you wanted to make. I would have expected we would have discussed those in a review first though. |
Now that we have the core design merged we can experiment with those details without them blocking the PR |
Here is an updated implementation of VAES256 and VAES512 support for
aes
.Benchmarks are from a Ryzen 9950X3D.
VAES512
VAES256
AES-NI